Skip to content

Fix critical bugs from comprehensive review + PRS-CS Python validation#464

Merged
gaow merged 4 commits intomainfrom
fix-review-bugs
Apr 7, 2026
Merged

Fix critical bugs from comprehensive review + PRS-CS Python validation#464
gaow merged 4 commits intomainfrom
fix-review-bugs

Conversation

@gaow
Copy link
Copy Markdown
Contributor

@gaow gaow commented Apr 7, 2026

Summary

Fix 3 critical bugs found in comprehensive code review, plus validation of PRS-CS against the original Python implementation (getian107/PRScs).

Bug fixes

1. lassosum_rss() lambda reorder bug (Critical)

result$beta[, order] <- result$beta was a self-assignment that silently produced wrong results. The lambda path is solved in descending order for warm-starting, then "reordered" back — but the reorder was a no-op. Fixed to use inverse permutation via order(order).

2. SDPR thread-safety (Documentation)

sample_assignment() uses a shared std::mt19937 across threads when n_threads > 1, causing a data race. This matches the original SDPR (eldronzhou/SDPR) which has the same issue with gsl_rng. The default n_threads=1 avoids the race. Documented with a clear warning comment.

3. check_ld eigenfix PSD vs PD (Medium)

eigenfix was setting negative eigenvalues to 0, producing a PSD (not PD) matrix. PRS-CS and SDPR require PD for Cholesky decomposition. Fixed to use r_tol (1e-8) as the floor, ensuring strict positive-definiteness.

PRS-CS Python validation

Compared pecotmr C++ PRS-CS against the original Python PRScs (getian107/PRScs) on identical simulated data (n=500, p=50, 10 causal SNPs):

Metric Value
cor(Python beta, C++ beta) 0.995
Python cor(est, truth) 0.985
C++ cor(est, truth) 0.972
Sigma ratio (C++/Python) 1.014
Top-10 SNP overlap 8/10

Despite different RNG implementations (numpy vs std::mt19937), both converge to essentially the same posterior.

Other review findings (not fixed, lower priority)

  • gigrnd() clamps result to [0,1] — matches original PRS-CS which does psi[psi>1]=1 post-GIG
  • phi memory leak risk on exception in PRS-CS — low risk, would need unique_ptr
  • LD blocks re-read per lambda in lassosum C++ — performance only
  • exists() search path in otters_weights — works when package is loaded
  • Float precision in SDPR sample_assignment — matches original SSE single-precision

Test plan

  • PRS-CS validated against original Python (cor=0.995)
  • devtools::test() passes
  • R CMD check passes

🤖 Generated with Claude Code

gaow and others added 4 commits April 7, 2026 11:42
1. lassosum_rss() lambda reorder bug (R/regularized_regression.R):
   Self-assignment result$beta[, order] <- result$beta was a no-op.
   Fixed to use inverse permutation: result$beta[, order(order)].
   Same fix for conv, loss, fbeta vectors.

2. SDPR thread-safety documentation (src/sdpr_mcmc.cpp):
   Document the data race on shared std::mt19937 when n_threads > 1.
   This matches the original SDPR which has the same issue. Default
   n_threads=1 avoids the race.

3. check_ld eigenfix now produces PD not just PSD (R/LD.R):
   Set negative eigenvalues to r_tol (default 1e-8) instead of 0,
   ensuring Cholesky decomposition succeeds in PRS-CS/SDPR. Setting
   to exactly 0 gave PSD which is not sufficient for Cholesky.

Also verified via PRS-CS comparison:
- cor(Python PRScs, pecotmr C++) = 0.995 on beta estimates
- Both recover identical signal (cor with truth ~0.97-0.98)
- Sigma estimates match within 1.4%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PRS-CS: Replace heap allocation (new/delete) of phi with stack
variables in both prs_cs_mcmc() and the Rcpp wrapper. The old code
leaked memory if an exception was thrown between new and delete
(e.g., Cholesky failure, user interrupt). Now uses phi_local on
the stack, eliminating the leak risk entirely.

lassosum: Document that min(fbeta) model selection is adequate
vs pseudovalidation. Empirical comparison over 20 random trials
shows no systematic advantage for either approach (4 vs 6 wins,
10 ties). The s-grid search provides the primary regularization;
lambda selection within each s has minimal impact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. pval_acat: handle very small p-values (< 1e-15) using asymptotic
   approximation tan(pi*(0.5-p)) ~ 1/(pi*p) to avoid Inf/NaN overflow.

2. otters_weights: remove global b-clamping that biased all methods.
   Move cor>=1 safeguard into lassosum_rss_weights() where it belongs,
   since only lassosum requires correlations strictly in (-1,1).
   PRS-CS and SDPR handle their own regularization.

3. check_ld eigenfix: trigger on !is_pd (not just !is_psd), so
   matrices with eigenvalues exactly 0 are also repaired to be
   strictly positive definite for Cholesky decomposition.

4. lassosum C++: cache LD blocks in std::vector<arma::mat> before
   the lambda loop instead of re-copying from Rcpp::List on every
   iteration. Uses const reference in inner loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gaow gaow merged commit 65a683f into main Apr 7, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant